Art of Assembly: Chapter Eight

Chapter Eight - MASM: Directives & Pseudo-Opcodes
8.0 - Chapter Overview
8.1 - Assembly Language Statements
8.2 - The Location Counter
8.3 - Symbols
8.4 - Literal Constants
8.4.1 - Integer Constants
8.4.2 - String Constants
8.4.3 - Real Constants
8.4.4 - Text Constants
8.5 - Declaring Manifest Constants Using Equates
8.6 - Processor Directives
8.7 - Procedures
8.8 - Segments
8.8.1 - Segment Names
8.8.2 - Segment Loading Order
8.8.3 - Segment Operands
8.8.3.1 - The ALIGN Type
8.8.3.2 - The COMBINE Type
8.8.4 - The CLASS Type
8.8.5 - The Read-only Operand
8.8.6 - The USE16, USE32, and FLAT Options
8.8.7 - Typical Segment Definitions
8.8.8 - Why You Would Want to Control the Loading Order
8.8.9 - Segment Prefixes
8.8.10 - Controlling Segments with the ASSUME Directive
8.8.11 - Combining Segments: The GROUP Directive
8.8.12 - Why Even Bother With Segments?
8.9 - The END Directive
8.10 - Variables
8.11 - Label Types
8.11.1 - How to Give a Symbol a Particular Type
8.11.2 - Label Values
8.11.3 - Type Conflicts
8.12 - Address Expressions
8.12.1 - Symbol Types and Addressing Modes
8.12.2 - Arithmetic and Logical Operators
8.12.3 - Coercion
8.12.4 - Type Operators
8.12.5 - Operator Precedence
8.13 - Conditional Assembly
8.13.1 - IF Directive
8.13.2 - IFE directive
8.13.3 - IFDEF and IFNDEF
8.13.4 - IFB, IFNB
8.13.5 - IFIDN, IFDIF, IFIDNI, and IFDIFI
8.14 - Macros
8.14.1 - Procedural Macros
8.14.2 - Macros vs. 80x86 Procedures
8.14.3 - The LOCAL Directive
8.14.4 - The EXITM Directive
8.14.5 - Macro Parameter Expansion and Macro Operators
8.14.6 - A Sample Macro to Implement For Loops
8.14.7 - Macro Functions
8.14.8 - Predefined Macros, Macro Functions, and Symbols
8.14.9 - Macros vs. Text Equates
8.14.10 - Macros: Good and Bad News
8.15 - Repeat Operations
8.16 - The FOR and FORC Macro Operations
8.17 - The WHILE Macro Operation
8.18 - Macro Parameters
8.19 - Controlling the Listing
8.19.1 - The ECHO and %OUT Directives
8.19.2 - The TITLE Directive
8.19.3 - The SUBTTL Directive
8.19.4 - The PAGE Directive
8.19.5 - The .LIST, .NOLIST, and .XLIST Directives
8.19.6 - Other Listing Directives
8.20 - Managing Large Programs
8.20.1 - The INCLUDE Directive
8.20.2 - The PUBLIC, EXTERN, and EXTRN Directives
8.20.3 - The EXTERNDEF Directive
8.21 - Make Files
8.22 - Sample Program
8.22.1 - EX8.MAK
8.22.2 - Matrix.A
8.22.3 - EX8.ASM
8.22.4 - GETI.ASM
8.22.5 - GetArray.ASM
8.22.6 - XProduct.ASM

Copyright 1996 by Randall Hyde

All rights reserved.

Duplication other than for immediate display through a browser is prohibited by U.S. Copyright Law.

This material is provided on-line as a beta-test of this text. It is for the personal use of the reader only. If you are interested in using this material as part of a course, please contact

rhyde@cs.ucr.edu

Supporting software and other materials are available via anonymous ftp from ftp.cs.ucr.edu. See the "/pub/pc/ibmpcdir" directory for details. You may also download the material from "Randall Hyde's Assembly Language Page" at URL:

http://webster.ucr.edu

Notes:

This document does not contain the laboratory exercises, programming assignments, exercises, or chapter summary. These portions were omitted for several reasons: either they wouldn't format properly, they contained hyperlinks that were too much work to resolve, they were under constant revision, or they were not included for security reasons. Such omission should have very little impact on the reader interested in learning this material or evaluating this document.

This document was prepared using Harlequin's Web Maker 2.2 and Quadralay's Webworks Publisher. Since HTML does not support the rich formatting options available in Framemaker, this document is only an approximation of the actual chapter from the textbook.

If you are absolutely dying to get your hands on a version other than HTML, you might consider having the UCR Printing and Reprographics Department run you off a copy on their Xerox machines. For details, please read the following EMAIL message I received from the Printing and Reprographics Department:

Hello Again Professor Hyde,

Dallas gave me permission to take orders for the Computer Science 13 Manuals. We would need to take charge card orders. The only cards we take are: Master Card, Visa, and Discover. They would need to send the name, numbers, expiration date, type of card, and authorization to charge $95.00 for the manual and shipping, also we should have their phone number in case the company has any trouble delivery. They can use my e-mail address for the orders and I will process them as soon as possible. I would assume that two weeks would be sufficient for printing, packages and delivery time.

I am open to suggestions if you can think of any to make this as easy as possible.

Thank You for your business,

Kathy Chapman, Assistant
Printing and Reprographics
University of California
Riverside
(909) 787-4443/4444

We are currently working on ways to publish this text in a form other than HTML (e.g., Postscript, PDF, Frameviewer, hard copy, etc.). This, however, is a low-priority project. Please do not contact Randall Hyde concerning this effort. When something happens, an announcement will appear on "Randall Hyde's Assembly Language Page." Please visit this WEB site at http://webster.ucr.edu for the latest scoop.

Chapter Eight MASM: Directives & Pseudo-Opcodes

Statements like mov ax,0 and add ax,bx are meaningless to the microprocessor. As arcane as these statements appear, they are still human readable forms of 80x86 instructions. The 80x86 itself responds to commands like B80000 and 03C3 (hexadecimal). An assembler is a program that converts strings like mov ax,0 to 80x86 machine code like "B80000". An assembly language program consists of statements like mov ax,0. The assembler converts an assembly language source file to machine code - the binary equivalent of the assembly language program. In this respect, the assembler program is much like a compiler: it reads an ASCII source file from the disk and produces a machine language program as output. The major difference between a compiler for a high level language (HLL) like Pascal and an assembler is that the compiler usually emits several machine instructions for each Pascal statement, whereas the assembler generally emits a single machine instruction for each assembly language statement.

Attempting to write programs in machine language (i.e., in binary) is not particularly bright. This process is very tedious, prone to mistakes, and offers almost no advantages over programming in assembly language. The only major disadvantage to assembly language over pure machine code is that you must first assemble and link a program before you can execute it. However, attempting to assemble the code by hand would take far longer than the small amount of time that the assembler takes to perform the conversion for you.

There is another disadvantage to learning assembly language. An assembler like Microsoft's Macro Assembler (MASM) provides a large number of features for assembly language programmers. Although learning about these features takes a fair amount of time, they are so useful that it is well worth the effort.

8.0 Chapter Overview

As in Chapter Six, much of the information in this chapter is reference material. As in any reference section, some of the material is essential, some is handy but optional, and some you may never use while writing programs. The following list outlines the information in this chapter. A "*" symbol marks the essential material. The "o" symbol marks the optional and lesser used subjects.

* Assembly language statement source format

o The location counter

* Symbols and identifiers

* Constants

* Procedure declarations

o Segments in an assembly language program

* Variables

* Symbol types

* Address expressions (later subsections contain advanced material)

o Conditional assembly

o Macros

o Listing directives

o Separate assembly

8.1 Assembly Language Statements

Assembly language statements in a source file use the following format:

{Label}         {Mnemonic       {Operand}}      {;Comment}

Each entity above is a field. The four fields above are the label field, the mnemonic field, the operand field, and the comment field.

The label field is (usually) an optional field containing a symbolic label for the current statement. Labels are used in assembly language, just as in HLLs, to mark lines as the targets of GOTOs (jumps). You can also specify variable names, procedure names, and other entities using symbolic labels. Most of the time the label field is optional, meaning a label need be present only if you want a label on that particular line. Some mnemonics, however, require a label, while others do not allow one. In general, you should always begin your labels in column one (this makes your programs easier to read).

A mnemonic is an instruction name (e.g., mov, add, etc.). The word mnemonic means memory aid. mov is much easier to remember than the binary equivalent of the mov instruction! The braces denote that this item is optional. Note, however, that you cannot have an operand without a mnemonic.

The mnemonic field contains an assembler instruction. Instructions are divided into three classes: 80x86 machine instructions, assembler directives, and pseudo opcodes. 80x86 instructions, of course, are assembler mnemonics that correspond to the actual 80x86 instructions introduced in Chapter Six.

Assembler directives are special instructions that provide information to the assembler but do not generate any code. Examples include the segment directive, equ, assume, and end. These mnemonics are not valid 80x86 instructions. They are messages to the assembler, nothing else.

A pseudo-opcode is a message to the assembler, just like an assembler directive; however, a pseudo-opcode will emit object code bytes. Examples of pseudo-opcodes include byte, word, dword, qword, and tbyte. These instructions emit the bytes of data specified by their operands, but they are not true 80x86 machine instructions.

The operand field contains the operands, or parameters, for the instruction specified in the mnemonic field. Operands never appear on lines by themselves. The type and number of operands (zero, one, two, or more) depend entirely on the specific instruction.

The comment field allows you to annotate each line of source code in your program. Note that the comment field always begins with a semicolon. When the assembler is processing a line of text, it completely ignores everything on the source line following a semicolon.

Each assembly language statement appears on its own line in the source file. You cannot have multiple assembly language statements on a single line. On the other hand, since all the fields in an assembly language statement are optional, blank lines are fine. You can use blank lines anywhere in your source file. Blank lines are useful for spacing out certain sections of code, making them easier to read.
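To see the four fields and the three classes of mnemonics side by side, consider the following minimal sketch. It assumes MASM 6.x and a 16-bit DOS target. The .model, .stack, .data, .code, .startup, and .exit lines are scaffolding that this section has not introduced, and the names Limit, Counter, and Again are made up for the illustration.

```asm
; Sketch only: assumes MASM 6.x and a 16-bit DOS target.

                .model  small           ;Scaffolding: memory model.
                .stack  100h            ;Scaffolding: reserve a stack.
                .data
Limit           equ     10              ;Directive: defines a symbol, emits no bytes.
Counter         word    0               ;Pseudo-opcode: emits two bytes of data.

                .code
                .startup                ;Scaffolding: program entry point.

; A comment may occupy an entire line, and blank lines are fine.

Again:          inc     Counter         ;Label, mnemonic, operand, and comment.
                cmp     Counter, Limit  ;No label on this line.
                jb      Again           ;Repeat until Counter reaches Limit.

                .exit                   ;Scaffolding: return to DOS.
                end
```

Note that only the word, inc, cmp, and jb lines correspond to the pseudo-opcodes and machine instructions you wrote yourself; the equ line simply tells the assembler what Limit means.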

The Microsoft Macro Assembler is a free form assembler. The various fields of an assembly language statement may appear in any column (as long as they appear in the proper order). Any number of spaces or tabs can separate the various fields in the statement. To the assembler, the following two code sequences are identical:

______________________________________________________

                mov     ax, 0
                mov     bx, ax
                add     ax, dx
                mov     cx, ax

______________________________________________________

mov             ax,                     0
          mov bx,                      ax
    add               ax, dx
                        mov             cx, ax

______________________________________________________

The first code sequence is much easier to read than the second (if you don't think so, perhaps you should go see a doctor!). With respect to readability, the judicious use of spacing within your program can make all the difference in the world.

Placing the labels in column one, the mnemonics in column 17 (two tabstops), the operand field in column 25 (the third tabstop), and the comments out around column 41 or 49 (five or six tabstops) produces the best looking listings. Assembly language programs are hard enough to read as it is. Formatting your listings to help make them easier to read will make them much easier to maintain.

You may have a comment on a line by itself. In such a case, place the semicolon in column one and use the entire line for the comment. For example:

; The following section of code positions the cursor to the upper
; left hand position on the screen:

                mov     X, 0
                mov     Y, 0

; Now clear from the current cursor position to the end of the
; screen to clear the video display:

;               etc.

8.2 The Location Counter

Recall that all addresses in the 80x86's memory space consist of a segment address and an offset within that segment. The assembler, in the process of converting your source file into object code, needs to keep track of offsets within the current segment. The location counter is an assembler variable that handles this.

Whenever you create a segment in your assembly language source file (see segments later in this chapter), the assembler associates the current location counter value with it. The location counter contains the current offset into the segment. Initially (when the assembler first encounters a segment) the location counter is set to zero. When encountering instructions or pseudo-opcodes, MASM increments the location counter for each byte written to the object code file. For example, MASM increments the location counter by two after encountering mov ax, bx since this instruction is two bytes long.

The value of the location counter varies throughout the assembly process. It changes for each line of code in your program that emits object code. We will use the term location counter to mean the value of the location counter at a particular statement before generating any code. Consider the following assembly language statements:

0 :             or      ah, 9
3 :             and     ah, 0c9h 
6 :             xor     ah, 40h 
9 :             pop     cx 
A :             mov     al, cl 
C :             pop     bp 
D :             pop     cx 
E :             pop     dx
F :             pop     ds 
10:             ret

The or, and, and xor instructions are all three bytes long; the mov instruction is two bytes long; the remaining instructions are all one byte long. If these instructions appear at the beginning of a segment, the location counter would be the same as the (hexadecimal) numbers that appear immediately to the left of each instruction above. For example, the or instruction above begins at offset zero. Since the or instruction is three bytes long, the next instruction (and) follows at offset three. Likewise, and is three bytes long, so xor follows at offset six, etc.
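One practical way to watch the location counter at work is MASM's "$" symbol, which stands for the current location counter value at the point where it appears. This section does not describe "$", so treat the following as a hedged sketch: it assumes MASM 6.x and a 16-bit DOS target, the segment scaffolding is explained later in the chapter, and Msg and MsgLen are hypothetical names.

```asm
; Sketch only: assumes MASM 6.x and a 16-bit DOS target.

                .model  small
                .stack  100h
                .data
Msg             byte    "Hello"         ;Five bytes, so the counter advances by 5.
MsgLen          =       $ - Msg         ;Current counter minus the offset of Msg.

                .code
                .startup
                mov     cx, MsgLen      ;Assembles as mov cx, 5.
                .exit
                end
```

Because the assembler does the subtraction, MsgLen stays correct if you later change the text of the string.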

8.3 Symbols

Consider the jmp instruction for a moment. This instruction takes the form:

                jmp   target

Target is the destination address. Imagine how painful it would be if you had to actually specify the target memory address as a numeric value. If you've ever programmed in BASIC (where line numbers are the same thing as statement labels) you've experienced about 10% of the trouble you would have in assembly language if you had to specify the target of a jmp by an address.

To illustrate, suppose you wanted to jump to some group of instructions you've yet to write. What is the address of the target instruction? How can you tell until you've written every instruction before the target instruction? What happens if you change the program? (Remember, inserting and deleting instructions will cause the location counter values for all the following instructions within that segment to change.) Fortunately, all these problems are of concern only to machine language programmers. Assembly language programmers can deal with addresses in a much more reasonable fashion - by using symbolic addresses.
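The following short sketch shows a symbolic address doing this job for a forward jump. It assumes MASM 6.x and a 16-bit DOS target; the .model, .stack, .code, .startup, and .exit lines are scaffolding, and SkipIt is a hypothetical label name.

```asm
; Sketch only: assumes MASM 6.x and a 16-bit DOS target.

                .model  small
                .stack  100h
                .code
                .startup
                mov     ax, 0
                jmp     SkipIt          ;Forward reference: SkipIt appears below.
                mov     ax, 1           ;Never executes.
SkipIt:         inc     ax              ;The assembler works out this offset.
                .exit
                end
```

If you insert or delete instructions between the jmp and SkipIt, you do not have to touch the jmp; the assembler recomputes the target address the next time you assemble the program.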

A symbol, identifier, or label is a name associated with some particular value. This value can be an offset within a segment, a constant, a string, a segment address, an offset within a record, or even an operand for an instruction. In any case, a label provides us with the ability to represent some otherwise incomprehensible value with a familiar, mnemonic name.

A symbolic name consists of a sequence of letters, digits, and special characters, with the following restrictions:

A symbol cannot begin with a numeric digit.
A name can have any combination of upper and lower case alphabetic characters. The assembler treats upper and lower case equivalently.
A symbol may contain any number of characters; however, MASM 5.1 and earlier use only the first 31 and ignore all characters beyond the 31st. (MASM 6.x treats up to 247 characters as significant.)
The _, $, ?, and @ symbols may appear anywhere within a symbol. However, $ and ? are special symbols; you cannot use either of these characters, by itself, as a symbol.
A symbol cannot match any name that is a reserved symbol. The following symbols are reserved:

%out            .186                    .286                    .286P
.287            .386                    .386P                   .387
.486            .486P                   .8086                   .8087
.ALPHA          .BREAK                  .CODE                   .CONST
.CREF           .DATA                   .DATA?                  .DOSSEG
.ELSE           .ELSEIF                 .ENDIF                  .ENDW
.ERR            .ERR1                   .ERR2                   .ERRB
.ERRDEF         .ERRDIF                 .ERRDIFI                .ERRE
.ERRIDN         .ERRIDNI                .ERRNB                  .ERRNDEF
.ERRNZ          .EXIT                   .FARDATA                .FARDATA?
.IF             .LALL                   .LFCOND                 .LIST
.LISTALL        .LISTIF                 .LISTMACRO              .LISTMACROALL
.MODEL          .MSFLOAT                .NO87                   .NOCREF
.NOLIST         .NOLISTIF               .NOLISTMACRO            .RADIX
.REPEAT         .UNTIL                  .SALL                   .SEQ
.SFCOND         .STACK                  .STARTUP                .TFCOND
.UNTIL          .UNTILCXZ               .WHILE                  .XALL
.XCREF          .XLIST                  ALIGN                   ASSUME
BYTE            CATSTR                  COMM                    COMMENT
DB              DD                      DF                      DOSSEG
DQ              DT                      DW                      DWORD
ECHO            ELSE                    ELSEIF                  ELSEIF1
ELSEIF2         ELSEIFB                 ELSEIFDEF               ELSEIFDIF
ELSEIFE         ELSEIFIDN               ELSEIFNB                ELSEIFNDEF
END             ENDIF                   ENDM                    ENDP
ENDS            EQU                     EVEN                    EXITM
EXTERN          EXTRN                   EXTERNDEF               FOR
FORC            FWORD                   GOTO                    GROUP
IF              IF1                     IF2                     IFB
IFDEF           IFDIF                   IFDIFI                  IFE
IFIDN           IFIDNI                  IFNB                    IFNDEF
INCLUDE         INCLUDELIB              INSTR                   INVOKE
IRP             IRPC                    LABEL                   LOCAL
MACRO           NAME                    OPTION                  ORG
PAGE            POPCONTEXT              PROC                    PROTO
PUBLIC          PURGE                   PUSHCONTEXT             QWORD
REAL4           REAL8                   REAL10                  RECORD
REPEAT          REPT                    SBYTE                   SDWORD
SEGMENT         SIZESTR                 STRUC                   STRUCT
SUBSTR          SUBTITLE                SUBTTL                  SWORD
TBYTE           TEXTEQU                 TITLE                   TYPEDEF
UNION           WHILE                   WORD

In addition, all valid 80x86 instruction names and register names are reserved as well. Note that this list applies to Microsoft's Macro Assembler version 6.0. Earlier versions of the assembler have fewer reserved words. Later versions may have more.

Some examples of valid symbols include:

        L1              Bletch          RightHere
        Right_Here      Item1           __Special
        $1234           @Home           $_@1 
        Dollar$         WhereAmI?       @1234

$1234 and @1234 are perfectly valid, strange though they may seem.

Some examples of illegal symbols include:

1TooMany        - Begins with a digit. 
Hello.There     - Contains a period in the middle of the symbol. 
$               - Cannot have $ or ? by itself. 
LABEL           - Assembler reserved word. 
Right Here      - Symbols cannot contain spaces.
Hi,There        - or other special symbols besides _, ?, $, and @.

Symbols, as mentioned previously, can be assigned numeric values (such as location counter values), strings, or even whole operands. To keep things straightened out, the assembler assigns a type to each symbol. Examples of types include near, far, byte, word, double word, quad word, text, and strings. How you declare labels of a certain type is the subject of much of the rest of this chapter. For now, simply note that the assembler always assigns some type to a label and will tend to complain if you try to use a label at some point where it does not allow that type of label.

8.4 Literal Constants

The Microsoft Macro Assembler (MASM) is capable of processing five different types of constants: integers, packed binary coded decimal integers, real numbers, strings, and text. In this chapter we'll consider integers, reals, strings, and text only. For more information about packed BCD integers please consult the Microsoft Macro Assembler Programmer's Guide.

A literal constant is one whose value is implicit from the characters that make up the constant. Examples of literal constants include:

123
3.14159
"Literal String Constant"
0FABCh
'A'
<Text Constant>

Except for the last example above, most of these literal constants should be reasonably familiar to anyone who has written a program in a high level language like Pascal or C++. Text constants are special forms of strings that allow textual substitution during assembly.

A literal constant's representation corresponds to what we would normally expect for its "real world value." Literal constants are also known as non symbolic constants since they use the value's actual representation, rather than some symbolic name, within your program. MASM also lets you define symbolic, or manifest, constants in a program, but more on that later.

8.4.1 Integer Constants

An integer constant is a numeric value that can be specified in binary, decimal, or hexadecimal. The choice of the base (or radix) is up to you. The following table shows the legal digits for each radix:

Digits Used With Each Radix

Name            Base    Valid Digits
Binary          2       0 1
Decimal         10      0 1 2 3 4 5 6 7 8 9
Hexadecimal     16      0 1 2 3 4 5 6 7 8 9 A B C D E F

To differentiate between numbers in the various bases, you use a suffix character. If you terminate a number with a "b" or "B", then MASM assumes that it is a binary number. If it contains any digits other than zero or one the assembler will generate an error. If the suffix is "t", "T", "d" or "D", then the assembler assumes that the number is a decimal (base 10) value. A suffix of "h" or "H" will select the hexadecimal radix.

All integer constants must begin with a decimal digit, including hexadecimal constants. To represent the value "FDED" you must specify 0FDEDh. The leading decimal digit is required by the assembler so that it can differentiate between symbols and numeric constants; remember, "FDEDh" is a perfectly valid symbol to the Microsoft Macro Assembler.

Examples:

                0F000h          12345d          0110010100b
                1234h           100h            08h

If you do not specify a suffix after your numeric constants, the assembler uses the current default radix. Initially, the default radix is decimal. Therefore, you can usually specify decimal values without the trailing "D" character. The .radix assembler directive can be used to change the default radix to some other base. The .radix instruction takes the following form:

                .radix  base    ;Optional comment

Base is a decimal value between 2 and 16.

The .radix statement takes effect as soon as MASM encounters it in the source file. All the statements before the .radix statement will use the previous default base for numeric constants. By sprinkling multiple .radix instructions throughout your source file, you can switch the default base amongst several values depending upon what's most convenient at each point in your program.

Generally, decimal is fine as the default base so the .radix instruction doesn't get used much. However, faced with entering a gigantic table of hexadecimal values, you can save a lot of typing by temporarily switching to base 16 before the table and switching back to decimal after the table. Note: if the default radix is hexadecimal, you should use the "T" suffix to denote decimal values since MASM will confuse the "D" suffix with a hexadecimal digit.
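The following sketch pulls the suffix rules and the .radix directive together. It assumes MASM 6.x and a 16-bit DOS target; the segment scaffolding is covered later in the chapter, and Hundred and HexTable are hypothetical names. The hexadecimal values were chosen so that none of them ends in "B" or "D", which sidesteps the suffix confusion mentioned above.

```asm
; Sketch only: assumes MASM 6.x and a 16-bit DOS target.

                .model  small
                .stack  100h
                .data

; All four of these words hold the same value, 100 decimal:

Hundred         word    100, 100t, 64h, 01100100b

                .radix  16              ;Unsuffixed constants are now hexadecimal.
HexTable        word    0FFFF, 1234, 0ABC
                word    100t            ;"T" still forces decimal: 100, not 100h.
                .radix  10              ;The operand of .radix is always decimal.

                .code
                .startup
                mov     ax, Hundred     ;Loads 100 (decimal) into ax.
                .exit
                end
```

Switching back to decimal immediately after the table keeps the rest of the source file reading the way you would normally expect.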

8.4.2 String Constants

A string constant is a sequence of characters surrounded by apostrophes or quotation marks.

Examples:

                "This is a string"
                 'So is this'

You may freely place apostrophes inside string constants enclosed by quotation marks and vice versa. If you want to place an apostrophe inside a string delimited by apostrophes, you must place a pair of apostrophes next to each other in the string, e.g.,

                'Doesn''t this look weird?'

Quotation marks appearing within a string delimited by quotes must also be doubled up, e.g.,

 "Microsoft claims ""Our software is very fast.""  Do you believe them?"

Although you can double up apostrophes or quotes as shown in the examples above, the easiest way to include these characters in a string is to use the other character as the string delimiter:

                "Doesn't this look weird?"
'Microsoft claims "Our software is very fast." Do you believe them?'

The only time it would be absolutely necessary to double up quotes or apostrophes in a string is if that string contained both symbols. This rarely happens in real programs.

As in the C and C++ programming languages, there is a subtle difference between a character value and a string value. A single character (that is, a string of length one) may appear anywhere MASM allows an integer constant or a string. If you specify a character constant where MASM expects an integer constant, MASM uses the ASCII code of that character as the integer value. Strings (whose length is greater than one) are allowed only within certain contexts.
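Here is a minimal sketch of these rules in a complete program. It assumes MASM 6.x and a 16-bit DOS target; the segment scaffolding is explained later in the chapter, and the variable names are made up for the example.

```asm
; Sketch only: assumes MASM 6.x and a 16-bit DOS target.

                .model  small
                .stack  100h
                .data
Plain           byte    "This is a string"
Doubled         byte    'Doesn''t this look weird?'     ;Doubled apostrophe.
Swapped         byte    "Doesn't this look weird?"      ;Same bytes as Doubled.
Letter          byte    'A'                             ;One byte, value 41h.

                .code
                .startup
                mov     al, 'A'         ;Character as an integer: mov al, 41h.
                cmp     al, Letter      ;Compares equal.
                .exit
                end
```

Note that the byte pseudo-opcode accepts a full string, while the mov instruction accepts only a single character, which MASM treats as the integer 41h.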

8.4.3 Real Constants

Within certain contexts, you can use floating point constants. MASM allows you to express floating point constants in one of two forms: decimal notation or scientific notation. These forms are quite similar to the format for real numbers that Pascal, C, and other HLLs use.

The decimal form is just a sequence of digits containing a decimal point in some position of the number:

        1.0     3.14159         625.25          -128.0          0.5

Scientific notation is also identical to the form used by various HLLs:

        1e5     1.567e-2                -6.02e-10               5.34e+12

The exact range and precision of these numbers depend on your particular floating point package. However, MASM generally emits binary data for the above constants that is compatible with the 80x87 numeric coprocessors. This form corresponds to the numeric format specified by the IEEE standard for floating point values. In particular, the constant 1.0 is not the binary equivalent of the integer one.
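The difference between 1.0 and the integer one is easy to see if you declare both and compare the bytes in the assembly listing. The following is a sketch that assumes MASM 6.x and a 16-bit DOS target; the names are hypothetical, and the segment scaffolding is covered later in the chapter.

```asm
; Sketch only: assumes MASM 6.x and a 16-bit DOS target.

                .model  small
                .stack  100h
                .data
IntOne          dword   1               ;Bytes in memory: 01 00 00 00
RealOne         real4   1.0             ;Bytes in memory: 00 00 80 3F (3F800000h)
Pi              real8   3.14159         ;Eight-byte IEEE double precision value.
Fraction        real4   1.567e-2        ;Scientific notation works too.

                .code
                .startup
                .exit
                end
```

IntOne and RealOne both occupy four bytes, but the bit patterns have nothing in common, so you cannot load one where the program expects the other.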

8.4.4 Text Constants

Text constants are not the same thing as string constants. A textual constant substitutes verbatim during the assembly process. For example, the characters 5[bx] could be a textual constant associated with the symbol VAR1. During assembly, an instruction of the form

                mov     ax, VAR1

would be converted to the instruction mov ax, 5[bx].

Textual equates are quite useful in MASM because MASM often insists on long strings of text for some simple assembly language operands. Using text equates allows you to simplify such operands by substituting some string of text for a single identifier in a statement.

A text constant consists of a sequence of characters surrounded by the "<" and ">" symbols. For example, the text constant 5[bx] would normally be written as <5[bx]>. When the text substitution occurs, MASM strips the delimiting "<" and ">" characters.

8.5 Declaring Manifest Constants Using Equates

A manifest constant is a symbol name that represents some fixed quantity during the assembly process. That is, it is a symbolic name that represents some value. Equates are the mechanism MASM uses to declare symbolic constants. Equates take three basic forms:

symbol          equ     expression
symbol          =       expression
symbol          textequ expression

The expression operand is typically a numeric expression or a text string. The symbol is given the value and type of the expression. The equ and "=" directives have been with MASM since the beginning. Microsoft added the textequ directive starting with MASM 6.0.

The purpose of the "=" directive is to define symbols that have an integer (or single character) quantity associated with them. This directive does not allow real, string, or text operands. This is the primary directive you should use to create numeric symbolic constants in your programs. Some examples:

NumElements     =       16
                 .
                 .
                 .
Array           byte    NumElements dup (?)
                 .
                 .
                 .
                mov     cx, NumElements
                mov     bx, 0
ClrLoop:        mov     Array[bx], 0
                inc     bx
                loop    ClrLoop

The textequ directive defines a text substitution symbol. The expression in the operand field must be a text constant delimited with the "<" and ">" symbols. Whenever MASM encounters the symbol within a statement, it substitutes the text in the operand field for the symbol. Programmers typically use this equate to save typing or to make some code more readable:

Count           textequ <6[bp]>
DataPtr         textequ <8[bp]>
                 .
                 .
                 .
                les     bx, DataPtr     ;Same as les bx, 8[bp]
                mov     cx, Count       ;Same as mov cx, 6[bp]
                mov     al, 0
ClrLp:          mov     es:[bx], al
                inc     bx
                loop    ClrLp

Note that it is perfectly legal to equate a symbol to a blank operand using an equate like the following:

BlankEqu        textequ <>

The purpose of such an equate will become clear in the sections on conditional assembly and macros.

The equ directive provides almost a superset of the capabilities of the "=" and textequ directives. It allows operands that are numeric, text, or string literal constants. The following are all legal uses of the equ directive:

One             equ     1
Minus1          equ     -1
TryAgain        equ     'Y'
StringEqu       equ     "Hello there"
TxtEqu          equ     <4[si]>
                 .
                 .
                 .
HTString        byte    StringEqu       ;Same as HTString byte "Hello there"
                 .
                 .
                 .
                mov     ax, TxtEqu      ;Same as mov ax, 4[si]
                 .
                 .
                 .
                mov     bl, One         ;Same as mov bl, 1
                cmp     al, TryAgain    ;Same as cmp al, 'Y'

Manifest constants you declare with equates help you parameterize a program. If you use the same value, string, or text, multiple times within a program, using a symbolic equate will make it very easy to change that value in future modifications to the program. Consider the following example:

Array           byte    16 dup (?)
                 .
                 .
                 .
                mov     cx, 16
                mov     bx, 0
ClrLoop:        mov     Array[bx], 0
                inc     bx
                loop    ClrLoop

If you decide you want Array to have 32 elements rather than 16, you will need to search throughout your program, locate every reference to this data, and adjust the literal constants accordingly. There is always the possibility that you will miss some particular section of code, introducing a bug into your program. On the other hand, if you use the NumElements symbolic constant shown earlier, you only have to change a single statement in your program and reassemble it; MASM automatically updates all references using NumElements.

MASM lets you redefine symbols declared with the "=" directive. That is, the following is perfectly legal:

SomeSymbol      =       0
                .
                .
                .
SomeSymbol      =       1

Since you can change the value of a constant in the program, the symbol's scope (where the symbol has a particular value) becomes important. If you could not redefine a symbol, you would expect the symbol to have that constant value everywhere in the program. Given that you can redefine a constant, a symbol's scope cannot be the entire program. MASM uses the obvious solution: a manifest constant's scope runs from the point it is defined to the point it is redefined. This has one important ramification: you must declare a manifest constant with the "=" directive before you use that constant. Of course, once you redefine a symbolic constant, the previous value of that constant is forgotten. Note that you cannot redefine a symbol to which you have given a numeric value with the equ directive. Text equates are different: MASM 6.x lets you assign new text to a symbol declared with textequ (or with equ and a text operand).
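
As a small sketch of this scoping rule (Value is a hypothetical symbol, not one used elsewhere in this chapter), each reference picks up whatever value was most recently assigned:

Value           =       1
                mov     ax, Value       ;Assembles as mov ax, 1
Value           =       Value + 1       ;Redefine Value in terms of its old value
                mov     bx, Value       ;Assembles as mov bx, 2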

8.6 Processor Directives

By default, MASM will only assemble instructions that are available on all members of the 80x86 family. In particular, this means it will not assemble instructions that are not available on the 8086 and 8088 microprocessors. By generating an error for non-8086 instructions, MASM prevents the accidental use of instructions that are not available on various processors. This is great unless, of course, you actually want to use those instructions available on processors beyond the 8086 and 8088. The processor directives let you enable the assembly of instructions available on later processors.

The processor directives are

.8086 .8087 .186 .286 .287

.286P .386 .387 .386P .486

.486P .586 .586P

None of these directives accept any operands.

The processor directives enable all instructions available on a given processor. Since the 80x86 family is upwards compatible, specifying a particular processor directive enables all instructions on that processor and all earlier processors as well.

The .8087, .287, and .387 directives activate the floating point instruction set for the given floating point coprocessors. However, the .8086 directive also enables the 8087 instruction set; likewise, .286 enables the 80287 instruction set and .386 enables the 80387 floating point instruction set. About the only purpose for these FPU (floating point unit) directives is to allow 80287 instructions with the 8086 or 80186 instruction set, or 80387 instructions with the 8086, 80186, or 80286 instruction set.
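
For example, the following sketch (assuming you really do want 8086 integer code combined with an 80387 coprocessor) enables the fsin instruction, which the 8087 and 80287 do not provide:

                .8086                   ;8086 instructions plus the 8087 FPU set
                .387                    ;Also accept 80387 FPU instructions
                 .
                 .
                 .
                fsin                    ;80387-only instruction, now legal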

The processor directives ending with a "P" allow assembly of privileged mode instructions. Privileged mode instructions are only useful to those writing operating systems, certain device drivers, and other advanced system routines. Since this text does not discuss privileged mode instructions, there is little need to discuss these privileged mode directives further.

The 80386 and later processors support two types of segments when operating in protected mode: 16 bit segments and 32 bit segments. In real mode, these processors support only 16 bit segments. The assembler must generate subtly different opcodes for 16 and 32 bit segments. If you've specified a 32 bit processor using .386, .486, or .586, MASM generates instructions for 32 bit segments by default. If you attempt to run such code in real mode under MS-DOS, you will probably crash the system. There are two solutions to this problem. The first is to specify use16 as an operand to each segment you create in your program. The other solution is slightly more practical: simply put the following statement after the 32 bit processor directive:

                option  segment:use16

This directive tells MASM to generate 16 bit segments by default, rather than 32 bit segments.
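
The two approaches might look like the following sketch. The segment names cseg and dseg are hypothetical, and the align, combine, and class operands shown are simply typical choices rather than requirements:

;First approach: attach use16 to each segment individually.

                .386
cseg            segment para public use16 'code'
                 .
                 .
                 .
cseg            ends

;Second approach: change the default once, right after the processor directive.

                .386
                option  segment:use16
dseg            segment para public 'data'      ;16 bit segment by default now
                 .
                 .
                 .
dseg            ends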

Note that MASM does not require an 80486 or Pentium processor if you specify the .486 or .586 directives. The assembler itself is written in 80386 code so you only need an 80386 processor to assemble any program with MASM. Of course, if you use 80486 or Pentium processor specific instructions, you will need an 80486 or Pentium processor to run the assembled code.

You can selectively enable or disable various instruction sets throughout your program. For example, you can turn on 80386 instructions for several lines of code and then return back to 8086 only instructions. The following code sequence demonstrates this:

                .386            ;Begin using 80386 instructions
                 .
                 .              ;This code can have 80386 instrs.
                 .
                .8086           ;Return back to 8086-only instr set.
                 .
                 .              ;This code can only have 8086 instrs.
                 .

It is possible to write a routine that detects, at run-time, what processor a program is actually running on. Therefore, you can detect an 80386 processor and use 80386 instructions. If you do not detect an 80386 processor, you can stick with 8086 instructions. By selectively turning 80386 instructions on in those sections of your program that execute only if an 80386 processor is present, you can take advantage of the additional instructions. Likewise, by turning off the 80386 instruction set in other sections of your program, you can prevent the inadvertent use of 80386 instructions in the 8086-only portion of the program.
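
A minimal sketch of this technique follows. It assumes the code lives in a 16 bit segment and that a hypothetical byte variable, Has386, is set to a non-zero value by some detection routine (not shown) when it finds an 80386 or later processor. The labels are likewise made up for illustration:

                cmp     Has386, 0       ;Did the detection code find an 80386?
                je      No386

                .386                    ;Allow 80386 instructions here only
                movzx   bx, al          ;Zero extend al into bx (80386 instr.)
                .8086                   ;Back to the 8086-only instruction set
                jmp     Done

No386:          mov     bl, al          ;8086 equivalent of movzx bx, al
                mov     bh, 0
Done: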

8.7 Procedures

Unlike HLLs, MASM doesn't enforce strict rules on exactly what constitutes a procedure. You can call a procedure at any address in memory. The first ret instruction encountered along that execution path terminates the procedure. Such expressive freedom, however, is often abused yielding programs that are very hard to read and maintain. Therefore, MASM provides facilities to declare procedures within your code. The basic mechanism for declaring a procedure is:

procname        proc    {NEAR or FAR}

            <statements>

procname        endp

As you can see, the definition of a procedure looks similar to that for a segment. One difference is that procname (that is, the name of the procedure you're defining) must be a unique identifier within your program. Your code calls this procedure using this name, so it wouldn't do to have another procedure by the same name; if you did, how would the program determine which routine to call?

Proc allows several different operands, though we will only consider three: the single keyword near, the single keyword far, or a blank operand field. MASM uses these operands to determine if you're calling this procedure with a near or far call instruction. They also determine which type of ret instruction MASM emits within the procedure. Consider the following two procedures:

NProc           proc    near
                mov     ax, 0
                ret
NProc           endp

FProc           proc    far
                mov     ax, 0FFFFH
                ret
FProc           endp

and:

                call    NProc
                call    FProc

The assembler automatically generates a three-byte (near) call for the first call instruction above because it knows that NProc is a near procedure. It also generates a five-byte (far) call instruction for the second call because FProc is a far procedure. Within the procedures themselves, MASM automatically converts all ret instructions to near or far returns depending on the type of routine.
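
To make the sizes concrete, here is a sketch of how those calls and returns encode in a 16 bit code segment. The opcode values are the standard 80x86 encodings; retn and retf are the explicit near and far forms of ret that MASM also accepts:

                call    NProc           ;0E8h + 16 bit displacement = 3 bytes
                call    FProc           ;09Ah + offset + segment    = 5 bytes
                 .
                 .
                 .
                retn                    ;0C3h, what ret becomes inside NProc
                retf                    ;0CBh, what ret becomes inside FProc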

Note that if you do not terminate a proc/endp section with a ret or some other transfer of control instruction and program flow runs into the endp directive, execution will continue with the next executable instruction following the endp. For example, consider the following:

Proc1           proc
                mov     ax, 0
Proc1           endp

Proc2           proc
                mov     bx, 0FFFFH
                ret
Proc2           endp

If you call Proc1, control will flow on into Proc2 starting with the mov bx,0FFFFh instruction. Unlike high level language procedures, an assembly language procedure does not contain an implicit return instruction before the endp directive. So always be aware of how the proc/endp directives work.
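
Adding an explicit ret closes the gap. The corrected Proc1 below is only a sketch of the fix; nothing else about the procedure changes:

Proc1           proc
                mov     ax, 0
                ret                     ;Return to caller; no fall through to Proc2
Proc1           endp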

There is nothing special about procedure declarations. They're a convenience provided by the assembler, nothing more. You could write assembly language programs for the rest of your life and never use the proc and endp directives. Doing so, however, would be poor programming practice. Proc and endp are marvelous documentation features which, when properly used, can help make your programs much easier to read and maintain.

MASM versions 6.0 and later treat all statement labels inside a procedure as local. That is, you cannot refer directly to those symbols outside the procedure. For more details, see "How to Give a Symbol a Particular Type" (section 8.11.1).
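
The following sketch illustrates the idea with hypothetical procedure and label names. It assumes MASM's default label scoping is in effect (that is, the program has not switched it off with an option directive):

Delay           proc
Again:          dec     cx              ;Again is local to Delay
                jnz     Again
Quit::          ret                     ;The "::" makes Quit visible outside Delay
Delay           endp

Because Again is local, another procedure can define its own Again label without conflict.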

Art of Assembly: Chapter Eight - 26 SEP 1996

[Next] [Art of Assembly][Randall Hyde]